Post-Learning Optimization of Tree Ensembles

Authors

  • Claudio Lucchese
  • Franco Maria Nardini
  • Salvatore Orlando
  • Raffaele Perego
  • Fabrizio Silvestri
  • Salvatore Trani
Abstract

Learning to Rank (LtR) is the machine learning method of choice for producing highly effective ranking functions. However, efficiency and effectiveness are two competing forces, and trading off effectiveness to meet the efficiency constraints typical of production systems is one of the most urgent issues. This extended abstract briefly summarizes the work in [4], which proposes CLEaVER, a new framework for optimizing LtR models based on ensembles of regression trees. We summarize the results of a comprehensive evaluation showing that CLEaVER is able to prune up to 80% of the trees and provides a speed-up of up to 2.6x without affecting the effectiveness of the model.

Modern search engines are expected to return highly relevant results in a fraction of a second to satisfy efficiency constraints. Learning-to-Rank (LtR) [1] methodologies are nowadays pervasively used as effective solutions to ranking problems. However, efficiency and effectiveness are intertwined concepts that often counteract each other. In this extended abstract we briefly summarize the work in [4], where we introduce CLEaVER, a framework developed on top of QuickRank [5] for optimizing LtR models based on ensembles of regression trees after the learning phase has completed. Since the cost of scoring a document with a tree ensemble model is linear in the size of the ensemble, CLEaVER first removes a subset of the trees and then fine-tunes the weights of the remaining ones according to a given quality measure. Results of a comprehensive evaluation using QuickScorer [2, 3], a state-of-the-art algorithm for efficient scoring, show that CLEaVER is able to improve the efficiency of a given ranking ensemble by up to a 2.6x speed-up factor without affecting the effectiveness of the model.
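The prune-then-re-weight idea described in the abstract can be illustrated with a minimal sketch. This is not the CLEaVER implementation: the pruning rule (keep the trees with the largest weights), the greedy coordinate-wise line search, and the pairwise-accuracy quality measure are all assumptions chosen for illustration, and the function names `pairwise_accuracy` and `prune_and_reweight` are hypothetical.

```python
import numpy as np


def pairwise_accuracy(scores, labels):
    """Fraction of document pairs with different labels that are ranked correctly.

    Stand-in quality measure (an assumption); a ranking metric such as NDCG
    could be plugged in instead.
    """
    score_diff = scores[:, None] - scores[None, :]
    label_diff = labels[:, None] - labels[None, :]
    mask = label_diff > 0  # pairs where the first document is more relevant
    if not mask.any():
        return 1.0
    return float(np.mean(score_diff[mask] > 0))


def prune_and_reweight(tree_scores, weights, keep, quality, labels,
                       factors=(0.5, 0.75, 1.25, 1.5), rounds=2):
    """Prune an additive tree ensemble, then re-tune the surviving weights.

    tree_scores: (n_docs, n_trees) array, output of each tree on each document.
    weights:     (n_trees,) array, weight of each tree in the ensemble.
    keep:        number of trees to retain.
    """
    # Step 1 (assumed pruning rule): keep the `keep` trees with largest |weight|.
    kept = np.sort(np.argsort(-np.abs(weights))[:keep])
    w = weights[kept].astype(float)
    S = tree_scores[:, kept]

    # Step 2 (assumed tuning rule): greedy line search, one weight at a time,
    # accepting a change only if the quality measure strictly improves.
    best = quality(S @ w, labels)
    for _ in range(rounds):
        for j in range(len(w)):
            for f in factors:
                trial = w.copy()
                trial[j] *= f
                q = quality(S @ trial, labels)
                if q > best:
                    best, w = q, trial
    return kept, w, best
```

Because scoring cost is linear in the number of trees, keeping `keep` out of `n_trees` trees reduces the per-document work of a naive scorer to roughly `keep / n_trees` of the original. The re-weighting step then tries to recover the quality lost by pruning.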


Similar resources

Mixtures of Bagged Markov Tree Ensembles

Key points:

  • Trees → efficient algorithms.
  • Mixture → improved modeling.

There are 2 approaches to improve over a single Chow-Liu tree: Bias reduction, e.g. EM algorithm [1]

  • Learning the mixture is viewed as a global optimization problem aiming at maximizing the data likelihood.
  • There is a bias-variance trade-off associated with the number of terms.
  • It leads to a partition of the learning s...


Multi-target regression with rule ensembles

Methods for learning decision rules are being successfully applied to many problem domains, in particular when understanding and interpretation of the learned model is necessary. In many real life problems, we would like to predict multiple related (nominal or numeric) target attributes simultaneously. While several methods for learning rules that predict multiple targets at once exist, they ar...


MMDT: Multi-Objective Memetic Rule Learning from Decision Tree

In this article, a Multi-Objective Memetic Algorithm (MA) for rule learning is proposed. Prediction accuracy and interpretability are two measures that conflict with each other. In this approach, we consider both the accuracy and the interpretability of rule sets. Additionally, individual classifiers face other problems, such as data sets with huge sizes, high dimensionality, and imbalanced class distributions. This...


Relevant Ensemble of Trees

Tree ensembles are flexible predictive models that can capture relevant variables and to some extent their interactions in a compact and interpretable manner. Most algorithms for obtaining tree ensembles are based on versions of boosting or Random Forest. Previous work showed that boosting algorithms exhibit a cyclic behavior of selecting the same tree again and again due to the way the loss is...


Decision Tree Ensembles in Biomedical Time-Series Classification

There are numerous classification methods developed in the field of machine learning. Some of these methods, such as artificial neural networks and support vector machines, are used extensively in biomedical time-series classification. Other methods have been used less often for no apparent reason. The aim of this work is to examine the applicability of decision tree ensembles as strong and pra...



Publication date: 2016